184 research outputs found

A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics

Authors: Pandey Gaurav; Whalen Sean
Publication date: 19/09/2013

The combination of multiple classifiers using ensemble methods is increasingly important for making progress in a variety of difficult prediction problems. We present a comparative analysis of several ensemble methods through two case studies in genomics, namely the prediction of genetic interactions and protein functions, to demonstrate their efficacy on real-world datasets and draw useful conclusions about their behavior. These methods include simple aggregation, meta-learning, cluster-based meta-learning, and ensemble selection using heterogeneous classifiers trained on resampled data to improve the diversity of their predictions. We present a detailed analysis of these methods across 4 genomics datasets and find that the best of these methods offer statistically significant improvements over the state of the art in their respective domains. In addition, we establish a novel connection between ensemble selection and meta-learning, demonstrating how both of these disparate methods establish a balance between ensemble diversity and performance.
Comment: 10 pages, 3 figures, 8 tables, to appear in Proceedings of the 2013 International Conference on Data Minin
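The difference between simple aggregation and ensemble selection mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the classifier names `a`, `b`, `c`, their scores, and the labels are made-up values, and greedy forward selection with replacement on a validation set is assumed here as one common formulation of ensemble selection.

```python
from statistics import mean

def accuracy(probs, labels):
    """Fraction of examples where thresholding the score at 0.5 matches the label."""
    return mean(int((p >= 0.5) == bool(y)) for p, y in zip(probs, labels))

def average(prob_lists):
    """Simple aggregation: per-example mean of the base classifiers' scores."""
    return [mean(column) for column in zip(*prob_lists)]

def greedy_ensemble_selection(base_preds, labels, max_size=10):
    """Repeatedly add (with replacement) the classifier that most improves
    the accuracy of the averaged ensemble; stop when nothing improves."""
    chosen, best_score = [], -1.0
    for _ in range(max_size):
        candidates = [
            (accuracy(average([base_preds[n] for n in chosen + [name]]), labels), name)
            for name in sorted(base_preds)
        ]
        score, name = max(candidates)
        if score <= best_score:
            break
        chosen.append(name)
        best_score = score
    return chosen, best_score

# Hypothetical validation-set scores from three base classifiers.
labels = [1, 0, 1, 0, 1, 0]
base_preds = {
    "a": [0.9, 0.6, 0.9, 0.4, 0.2, 0.1],
    "b": [0.2, 0.1, 0.7, 0.2, 0.9, 0.9],
    "c": [0.7, 0.3, 0.2, 0.7, 0.7, 0.2],
}

mean_all = accuracy(average(list(base_preds.values())), labels)
chosen, score = greedy_ensemble_selection(base_preds, labels)
```

In this toy data each classifier errs on different examples, so averaging all three corrects every individual mistake; that is the kind of diversity the abstract refers to.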

arXiv.org e-Print Archive

Crossref

Structural Drift: The Population Dynamics of Sequential Learning

Authors: Crutchfield James P.; Whalen Sean
Publication date: 01/01/2012

We introduce a theory of sequential causal inference in which learners in a chain estimate a structural model from their upstream teacher and then pass samples from the model to their downstream student. It extends the population dynamics of genetic drift, recasting Kimura's selectively neutral theory as a special case of a generalized drift process using structured populations with memory. We examine the diffusion and fixation properties of several drift processes and propose applications to learning, inference, and evolution. We also demonstrate how the organization of drift process space controls fidelity, facilitates innovations, and leads to information loss in sequential learning with and without memory.
Comment: 15 pages, 9 figures; http://csc.ucdavis.edu/~cmg/compmech/pubs/sdrift.ht

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Genetic Algorithm Amplifier Biasing System (GAABS): Genetic Algorithm for Biasing on Differential Analog Amplifiers

Author: Whalen Sean
Publication venue: DigitalCommons@CalPoly
Publication date: 01/06/2018

Genetic Algorithm Amplifier Biasing System (GAABS) - Senior Project Analysis

Summary of Functional Requirements: This project integrates LTSpice with a Python script that runs a genetic algorithm to bias a differential amplifier. The system biases the amplifier with two voltages: the base voltage for the PNP BJTs of the active loads, and a voltage controlling the current of the current sink. The script gets data from LTSpice through a command-line call and runs iteratively until the system is biased to achieve the greatest gain on an arbitrary input voltage.

Primary Constraints: The main challenges of this project are getting the genetic algorithm to work consistently and getting LTSpice to integrate well with the command line. The genetic algorithm, though controlled, involves a good deal of randomness in converging to a certain gain value. A strong genetic algorithm should converge to the same value every time and should be designed accordingly. I have never used LTSpice via the command line, but it shouldn't be too difficult to call. Collecting data from the simulation will be challenging, but ideally there will be resources to help with that portion.

Economic: The original estimated cost for components is $0, as all the software should be open source and free to download, and access to a computer should be considered free. There is no hardware, as it's all simulation, so there is nothing to be purchased.

Bill of Materials:
Item                   Cost
Bootcamp Application   $0
Python 2.7             $0
LTSpice                $0
Total                  $0

The total did end up being $0, as anticipated. Everything that could be downloaded was free to download. At the start of the project, development was anticipated to take 100+ hours. Given the need to integrate everything and to get the genetic algorithm working well, 100 hours seemed reasonable. In the end, it took roughly 80 hours. Trying different approaches to the problem took up a lot of time, and tweaking the genetic algorithm (and running the tests) took a long time, but the integration was easy to set up, which shaved a large chunk off the projected time to complete the project.

Manufacturing Information: This code is open source on GitHub and won't be manufactured on a commercial basis.

Environmental: There are no environmental impacts associated with manufacturing. The only potential impact on the environment of this project would be the heat generated by a computer running the script. The script can take 30 minutes or more to run, and it is somewhat intensive in terms of computing power; this would generate heat from the computer running it, and heat from computers cannot be neglected in terms of its effect on global warming. However, the heat generated by one computer should be considered negligible, as there are much greater contributors.

Manufacturability: As stated before, there are no issues with manufacturing this project because it's open source. Everything needed to run the code can be found online for free download, and the script can be taken from online.

Sustainability: The code runs on Python 2.7 and the current version of LTSpice. It should have no issue running on later versions of Python and LTSpice, so long as there are no drastic changes. The project is hosted online, so it will remain available as long as it is not taken down from GitHub. Upgrades that would improve the design include running more children per generation in simulation at once to speed up runtime, and running more generations to reach more accurate bias voltages.

Ethical: There is no ethical implication to the use or design of this project.

Health and Safety: Other than the impact of long-term computer use on a user, there are no health and safety concerns with this project.

Social and Political: There are no social and political implications to the use or design of this project.

Development: During the development of this project, I had to learn how to use Python on a much deeper level. My CPE 101 class was in Python, but that was winter quarter of 2015, and this project took place in the winter and spring of 2018. I remembered very little, but I got to see a lot of the functionality of Python as a language for running scripts that work with a variety of applications across platforms. I had to research a lot on genetic algorithms and how to implement them, as that was a huge portion of this project
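The search loop this summary describes can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's script: `simulated_gain` is a hypothetical stand-in for the LTSpice command-line run (its peak location and shape are invented), the voltage bounds are arbitrary, and the code targets Python 3 rather than the Python 2.7 named in the abstract.

```python
import random

def simulated_gain(v_base, v_sink):
    """Hypothetical stand-in for one LTSpice run: a smooth gain surface
    that peaks at v_base = 3.2 V, v_sink = 1.1 V (invented values)."""
    return 100.0 - 40.0 * (v_base - 3.2) ** 2 - 60.0 * (v_sink - 1.1) ** 2

def evolve(fitness, bounds, pop_size=20, generations=100, mutation=0.1, seed=0):
    """Genetic algorithm with elitism, uniform crossover, and Gaussian mutation."""
    rng = random.Random(seed)
    pop = [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(pop_size)]
    for _ in range(generations):
        # Rank by fitness and keep the fittest quarter unchanged (elitism).
        pop.sort(key=lambda ind: fitness(*ind), reverse=True)
        parents = pop[: pop_size // 4]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = []
            for (lo, hi), x, y in zip(bounds, a, b):
                # Take each gene from one parent, then perturb it slightly.
                gene = rng.choice((x, y)) + rng.gauss(0.0, mutation)
                child.append(min(hi, max(lo, gene)))  # clamp to allowed range
            children.append(tuple(child))
        pop = parents + children
    return max(pop, key=lambda ind: fitness(*ind))

# Two bias voltages, each searched over an assumed 0 V to 5 V range.
best = evolve(simulated_gain, bounds=[(0.0, 5.0), (0.0, 5.0)])
```

Fixing the random seed makes a run repeatable, which is one simple way to approach the consistency concern raised under Primary Constraints; a larger `pop_size` or more `generations` corresponds to the upgrades listed under Sustainability.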

DigitalCommons@CalPoly

Verbal mediation of visual memory on the Continuous Visual Memory Test

Author: Whalen Sean P.
Publication venue: University of Montana, Maureen and Mike Mansfield Library
Publication date: 01/01/1999

University of Montana

The Epstein-Barr Virus Episome Maneuvers between Nuclear Chromatin Compartments during Reactivation.

Authors: Fernandez Samantha G; McBride Alison A; Miranda Jj L; Moquin Stephanie A; Pollard Katherine S; Thomas Sean; Warburton Alix; Whalen Sean
Publication venue: eScholarship, University of California
Publication date: 01/02/2018

The human genome is structurally organized in three-dimensional space to facilitate functional partitioning of transcription. We learned that the latent episome of the human Epstein-Barr virus (EBV) preferentially associates with gene-poor chromosomes and avoids gene-rich chromosomes. Kaposi's sarcoma-associated herpesvirus behaves similarly, but human papillomavirus does not. Contacts on the EBV side localize to OriP, the latent origin of replication. This genetic element and the EBNA1 protein that binds there are sufficient to reconstitute chromosome association preferences of the entire episome. Contacts on the human side localize to gene-poor and AT-rich regions of chromatin distant from transcription start sites. Upon reactivation from latency, however, the episome moves away from repressive heterochromatin and toward active euchromatin. Our work adds three-dimensional relocalization to the molecular events that occur during reactivation. Involvement of myriad interchromosomal associations also suggests a role for this type of long-range association in gene regulation.

IMPORTANCE: The human genome is structurally organized in three-dimensional space, and this structure functionally affects transcriptional activity. We set out to investigate whether a double-stranded DNA virus, Epstein-Barr virus (EBV), uses mechanisms similar to those of the human genome to regulate transcription. We found that the EBV genome associates with repressive compartments of the nucleus during latency and with active compartments during reactivation. This study advances our knowledge of the EBV life cycle, adding three-dimensional relocalization as a novel component to the molecular events that occur during reactivation. Furthermore, the data add to our understanding of nuclear compartments, showing that disperse interchromosomal interactions may be important for regulating transcription

eScholarship - University of California

Observability and Controllability of Nonlinear Networks: The Role of Symmetry

Authors: Brennan Sean N.; Sauer Timothy D.; Schiff Steven J.; Whalen Andrew J.
Publication venue: American Physical Society (APS)
Publication date: 06/10/2014

Observability and controllability are essential concepts to the design of predictive observer models and feedback controllers of networked systems. For example, noncontrollable mathematical models of real systems have subspaces that influence model behavior, but cannot be controlled by an input. Such subspaces can be difficult to determine in complex nonlinear networks. Since almost all of the present theory was developed for linear networks without symmetries, here we present a numerical and group representational framework, to quantify the observability and controllability of nonlinear networks with explicit symmetries that shows the connection between symmetries and nonlinear measures of observability and controllability. We numerically observe and theoretically predict that not all symmetries have the same effect on network observation and control. Our analysis shows that the presence of symmetry in a network may decrease observability and controllability, although networks containing only rotational symmetries remain controllable and observable. These results alter our view of the nature of observability and controllability in complex networks, change our understanding of structural controllability, and affect the design of mathematical models to observe and control such networks.
Comment: 19 pages, 9 figure

arXiv.org e-Print Archive

Directory of Open Access Journals

Model Aggregation for Distributed Content Anomaly Detection

Authors: Boggs Nathaniel Gordon; Stolfo Salvatore; Whalen Sean
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2014

Cloud computing offers a scalable, low-cost, and resilient platform for critical applications. Securing these applications against attacks targeting unknown vulnerabilities is an unsolved challenge. Network anomaly detection addresses such zero-day attacks by modeling attributes of attack-free application traffic and raising alerts when new traffic deviates from this model. Content anomaly detection (CAD) is a variant of this approach that models the payloads of such traffic instead of higher level attributes. Zero-day attacks then appear as outliers to properly trained CAD sensors. In the past, CAD was unsuited to cloud environments due to the relative overhead of content inspection and the dynamic routing of content paths to geographically diverse sites. We challenge this notion and introduce new methods for efficiently aggregating content models to enable scalable CAD in dynamically-pathed environments such as the cloud. These methods eliminate the need to exchange raw content, drastically reduce network and CPU overhead, and offer varying levels of content privacy. We perform a comparative analysis of our methods using Random Forest, Logistic Regression, and Bloom Filter-based classifiers for operation in the cloud or other distributed settings such as wireless sensor networks. We find that content model aggregation offers statistically significant improvements over non-aggregate models with minimal overhead, and that distributed and non-distributed CAD have statistically indistinguishable performance. Thus, these methods enable the practical deployment of accurate CAD sensors in a distributed attack detection infrastructure
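One way content models can be combined without exchanging raw payloads is to merge Bloom filters bit by bit. The sketch below is an assumption-laden illustration, not the paper's method: the abstract does not describe its aggregation algorithms, and the byte 3-grams, filter size, hash construction, and sample payloads are all choices made here for the example.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other):
        """Aggregate two models by OR-ing their bits; no payloads are shared."""
        assert (self.size, self.k) == (other.size, other.k)
        merged = BloomFilter(self.size, self.k)
        merged.bits = self.bits | other.bits
        return merged

def ngrams(payload, n=3):
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]

def train(payloads, n=3):
    """Build a content model from the n-grams of attack-free payloads."""
    model = BloomFilter()
    for payload in payloads:
        for gram in ngrams(payload, n):
            model.add(gram)
    return model

def anomaly_score(model, payload, n=3):
    """Fraction of the payload's n-grams that the model has never seen."""
    grams = ngrams(payload, n)
    if not grams:
        return 0.0
    return sum(1 for g in grams if g not in model) / len(grams)

# Two hypothetical sites each train locally, then share only their filters.
site_a = train([b"GET /index.html HTTP/1.1"])
site_b = train([b"POST /login HTTP/1.1"])
merged = site_a.union(site_b)
```

Because a Bloom filter never produces false negatives, traffic that either site considered normal scores 0.0 against the merged model, while only fixed-size bit arrays cross the network.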

Crossref

Columbia University Academic Commons

The counting stroop: An interference task specialized for functional neuroimaging—validation study with functional MRI

Authors: Bruce R. Rosen; George Bush; Michael A. Jenike; Paul J. Whalen; Scott L. Rauch; Sean C. McInerney
Publication venue: Wiley
Publication date: 01/01/2002

Crossref

Improving Critical Speed Calculations Using Flexible Bearing Support FRF Compliance Data.

Authors: Franklin Sean D.; Nicholas John C.; Whalen John K.
Publication venue: Biophysical Society of Japan
Publication date: 01/01/1986

Lecture, Pg. 69-78
The importance of including flexible supports in rotordynamic analyses is discussed. Various methods of including the support in rotordynamic calculations are reviewed. A method is described in which actual compliance frequency response function, FRF, data are used directly in a rotordynamic forced response computer program to accurately predict a steam turbine rotor's critical speed. The flexible support model is described as two single degree of freedom, SDOF, spring-mass-damper systems per bearing support. The methodology of acquiring the FRF data via impact hammer testing is described, and the equations are summarized that incorporate the FRF data into the flexible support model. Three flexible support models of increasing sophistication are used to analytically predict the rotor and support resonances. These results are compared to the actual steam turbine speed-amplitude plots. Modelling the support as many speed dependent SDOF systems accurately predicts the location of the rotor's first critical speed and also the split critical peaks and several support resonance speeds
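For readers unfamiliar with the SDOF building block named in this abstract, the compliance FRF of a single spring-mass-damper follows from the standard textbook relation X/F = 1 / (k - m ω² + i c ω). The sketch below only evaluates that relation; the mass, damping, and stiffness values are arbitrary illustrative numbers, and nothing here reproduces the paper's measured data or its equations for coupling the support to the rotor.

```python
import math

def sdof_compliance(omega, m, c, k):
    """Compliance FRF X/F (m/N) of a spring-mass-damper at omega (rad/s)."""
    return 1.0 / complex(k - m * omega ** 2, c * omega)

def dynamic_stiffness(omega, m, c, k):
    """Inverse of the compliance: force per unit displacement (N/m)."""
    return complex(k - m * omega ** 2, c * omega)

# Arbitrary example support: 100 kg, 2e3 N*s/m, 1e8 N/m.
m, c, k = 100.0, 2.0e3, 1.0e8
omega_n = math.sqrt(k / m)  # undamped natural frequency, rad/s

static = sdof_compliance(0.0, m, c, k)        # equals 1/k at zero frequency
resonant = sdof_compliance(omega_n, m, c, k)  # magnitude 1/(c*omega_n)
```

At the natural frequency the stiffness and inertia terms cancel, so the response is limited only by damping; this is why a support resonance near running speed can shift or split a rotor's critical speed.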

Texas A&M Repository